tl;dr

This analysis uses Boruta as an initial pre-filter on the covariates, to narrow the feature selection search space.

Method

Apply Boruta to each performance covariate.

## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(engagement)` instead of `engagement` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.

Boruta is a feature selection algorithm built on random forests. It compares the importance of each feature against that of randomly permuted "shadow" copies of the features. A feature that it can neither clearly confirm nor clearly reject is marked Tentative; increasing maxRuns can sometimes resolve these.
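A minimal sketch of this step, assuming the Boruta package from CRAN. The helper name run_boruta, the data frame df, the max_runs value, and the choice to use every remaining column as a candidate feature are illustrative assumptions, not the code behind the output below.

```r
# Sketch only: run_boruta() and its arguments are hypothetical names.
run_boruta <- function(df, response, max_runs = 500) {
  # Assumption: one metric is the response; all other columns are candidates.
  fit <- Boruta::Boruta(
    x = df[, setdiff(names(df), response), drop = FALSE],
    y = df[[response]],
    maxRuns = max_runs,  # more runs give Tentative features more chances to resolve
    doTrace = 0
  )
  # Settle any feature that is still Tentative by comparing its median
  # importance with the median importance of the best shadow feature.
  # (Boruta warns and returns the fit unchanged if nothing is Tentative.)
  Boruta::TentativeRoughFix(fit)
}
```

For a single metric this could be called as run_boruta(df, "active_hours"), with Boruta::attStats() on the result giving per-feature importance statistics and the final decision.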

## [1] "Applying Boruta to  active_hours"
## [1] "Applying Boruta to  active_hours_max"
## [1] "Applying Boruta to  uri_count"
## [1] "Applying Boruta to  uri_count_max"
## [1] "Applying Boruta to  search_count"
## [1] "Applying Boruta to  search_count_max"
## [1] "Applying Boruta to  num_pages"
## [1] "Applying Boruta to  num_pages_max"
## [1] "Applying Boruta to  daily_max_tabs"
## [1] "Applying Boruta to  daily_max_tabs_max"
## [1] "Applying Boruta to  daily_unique_domains"
## [1] "Applying Boruta to  daily_unique_domains_max"
## [1] "Applying Boruta to  daily_tabs_opened"
## [1] "Applying Boruta to  daily_tabs_opened_max"

For each metric, find the 5 top-ranked features and add them to a combined list.

top5
daily_num_sessions_started
daily_num_sessions_started_max
FX_PAGE_LOAD_MS_2_PARENT
fxa_configured_False
fxa_configured_True
memory_mb
num_active_days
num_addons
num_bookmarks
profile_age
profile_age_cat
session_length
session_length_max
TIME_TO_DOM_COMPLETE_MS
TIME_TO_DOM_CONTENT_LOADED_END_MS
TIME_TO_DOM_INTERACTIVE_MS
TIME_TO_LOAD_EVENT_END_MS
TIME_TO_NON_BLANK_PAINT_MS
timezone_cat_(0,2]

Increasing to the top 10 per metric:

top10
country_GB
country_US
daily_num_sessions_started
daily_num_sessions_started_max
default_search_engine_other (non-bundled)
FX_PAGE_LOAD_MS_2_PARENT
fxa_configured_False
fxa_configured_True
memory_cat
memory_mb
num_active_days
num_addons
num_bookmarks
profile_age
profile_age_cat
session_length
session_length_max
startup_ms
startup_ms_max
sync_configured_False
sync_configured_True
TIME_TO_DOM_COMPLETE_MS
TIME_TO_DOM_CONTENT_LOADED_END_MS
TIME_TO_DOM_INTERACTIVE_MS
TIME_TO_LOAD_EVENT_END_MS
TIME_TO_NON_BLANK_PAINT_MS
timezone_cat_(0,2]
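The top5 and top10 lists above pool the per-metric winners into one deduplicated list. A rough sketch of that pooling follows. It assumes features are ranked by mean importance (the meanImp column that Boruta::attStats() returns); the report does not state which ranking was used. The helper top_k_union and the toy importances are hypothetical.

```r
# Sketch only: top_k_union() is a hypothetical helper. `stats_by_metric` is
# assumed to be a named list with one data frame per metric, shaped like the
# output of Boruta::attStats(): feature names as row names, plus a meanImp column.
top_k_union <- function(stats_by_metric, k = 5) {
  top_per_metric <- lapply(stats_by_metric, function(stats) {
    ranked <- stats[order(stats$meanImp, decreasing = TRUE), , drop = FALSE]
    head(rownames(ranked), k)
  })
  # Pool the per-metric winners, drop duplicates, and sort by name.
  sort(unique(unlist(top_per_metric, use.names = FALSE)))
}

# Toy input with made-up importances, only to show the shape of the data.
toy <- list(
  active_hours = data.frame(
    meanImp = c(9, 7, 1),
    row.names = c("session_length", "memory_mb", "num_addons")
  ),
  uri_count = data.frame(
    meanImp = c(8, 2, 6),
    row.names = c("profile_age", "memory_mb", "num_addons")
  )
)
top_k_union(toy, k = 2)
```

Because the result is a union, its length can be well below k times the number of metrics when the same features rank highly for several metrics.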

Equal Labels

Equalize the number of observations per label, then repeat the steps above.
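One way to equalize labels is to randomly downsample every label to the size of the rarest one. The sketch below assumes that approach; the report does not say how the equalization was actually done, and the helper equalize_labels, the column name label, and the seed are hypothetical.

```r
# Sketch only: equalize_labels() and the `label` column name are hypothetical.
equalize_labels <- function(df, label_col = "label", seed = 42) {
  set.seed(seed)  # make the random downsample reproducible
  # Row indices grouped by label; drop = TRUE ignores unused factor levels.
  groups <- split(seq_len(nrow(df)), df[[label_col]], drop = TRUE)
  n_min <- min(lengths(groups))  # size of the rarest label
  # Draw n_min row indices from every label, without replacement.
  keep <- unlist(
    lapply(groups, function(idx) idx[sample.int(length(idx), n_min)]),
    use.names = FALSE
  )
  df[sort(keep), , drop = FALSE]
}
```

Downsampling discards rows from the larger labels, so the balanced data set can be much smaller than the original.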

## [1] "Applying Boruta to  active_hours"
## [1] "Applying Boruta to  active_hours_max"
## [1] "Applying Boruta to  uri_count"
## [1] "Applying Boruta to  uri_count_max"
## [1] "Applying Boruta to  search_count"
## [1] "Applying Boruta to  search_count_max"
## [1] "Applying Boruta to  num_pages"
## [1] "Applying Boruta to  num_pages_max"
## [1] "Applying Boruta to  daily_max_tabs"
## [1] "Applying Boruta to  daily_max_tabs_max"
## [1] "Applying Boruta to  daily_unique_domains"
## [1] "Applying Boruta to  daily_unique_domains_max"
## [1] "Applying Boruta to  daily_tabs_opened"
## [1] "Applying Boruta to  daily_tabs_opened_max"

For each metric, find the 5 top-ranked features and add them to a combined list.

top5
daily_num_sessions_started
daily_num_sessions_started_max
FX_PAGE_LOAD_MS_2_PARENT
memory_mb
num_active_days
num_addons
num_bookmarks
profile_age
profile_age_cat
session_length
session_length_max
TIME_TO_DOM_COMPLETE_MS
TIME_TO_DOM_CONTENT_LOADED_END_MS
TIME_TO_DOM_INTERACTIVE_MS
TIME_TO_LOAD_EVENT_END_MS
TIME_TO_NON_BLANK_PAINT_MS

Increasing to the top 10 per metric:

top10
cpu_speed_mhz
daily_num_sessions_started
daily_num_sessions_started_max
default_search_engine_other (non-bundled)
FX_PAGE_LOAD_MS_2_PARENT
memory_mb
num_active_days
num_addons
num_bookmarks
profile_age
profile_age_cat
session_length
session_length_max
startup_ms
startup_ms_max
TIME_TO_DOM_COMPLETE_MS
TIME_TO_DOM_CONTENT_LOADED_END_MS
TIME_TO_DOM_INTERACTIVE_MS
TIME_TO_LOAD_EVENT_END_MS
TIME_TO_NON_BLANK_PAINT_MS